k-link EST clustering: evaluating error introduced by chimeric sequences under different degrees of linkage
Authors
Abstract
MOTIVATION: The clustering of expressed sequence tags (ESTs) is a crucial step in many sequence analysis studies that require a high level of redundancy. Chimeric sequences, while uncommon, can make it difficult to achieve the optimal EST clustering. Single-linkage algorithms are particularly vulnerable to chimeras, because a single chimeric EST is enough to merge two unrelated clusters. To avoid such chimera-facilitated erroneous merges, researchers using single-linkage algorithms are forced to use stringent sequence-similarity thresholds, which reduce the sensitivity of the clustering algorithm.

RESULTS: We introduce the concept of k-link clustering for EST data and evaluate how clustering error rates vary over a range of linkage thresholds. Using k-link, we show that Type II error decreases as the number of shared ESTs (i.e. links) required increases. We observe a base level of Type II error that is likely caused by unmasked low-complexity or repetitive sequence. We find that Type I error increases gradually with increased linkage. To minimize the Type I error introduced by increased linkage requirements, we propose an extension to k-link that modifies the required number of links with respect to the size of the clusters being compared.

AVAILABILITY: k-link is written in C++ and licensed under the GNU General Public License (GPL). It can be downloaded from http://www.bioinformatics.csiro.au/products.shtml.
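The abstract does not spell out the merge rule, so the following Python sketch is only an illustration of the general idea, not the authors' C++ implementation. It assumes that a "link" is one distinct pairwise similarity hit between an EST in one cluster and an EST in another, that clustering starts from a set of seed clusters, and that two clusters merge once at least k such links connect them (k = 1 then behaves like single linkage). The `size_aware` option is a hypothetical stand-in for the size-dependent extension mentioned above: it caps the requirement at the size of the smaller cluster. The names `count_links` and `k_link_merge` are invented for this example.

```python
from itertools import combinations


def count_links(a, b, edges):
    """Count similarity edges with one endpoint in cluster a and the other in b."""
    return sum(1 for u, v in edges if (u in a and v in b) or (u in b and v in a))


def k_link_merge(clusters, edges, k, size_aware=False):
    """Repeatedly merge any two clusters joined by at least the required links.

    Assumption: the requirement is k links, or min(k, size of the smaller
    cluster) when size_aware is True. This rule is illustrative only.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [set(c) for c in clusters]
    # Treat edges as undirected and drop duplicates so each link counts once.
    edges = {tuple(sorted(e)) for e in edges}
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(clusters)), 2):
            required = k
            if size_aware:
                required = min(k, len(clusters[i]), len(clusters[j]))
            if count_links(clusters[i], clusters[j], edges) >= required:
                clusters[i] |= clusters[j]
                del clusters[j]
                changed = True
                break  # cluster list changed, so restart the pair scan
    return sorted(sorted(c) for c in clusters)


# Toy data: "a3" is a chimeric EST that also hits "b1";
# "b4" is a singleton that genuinely belongs with the b cluster.
seeds = [{"a1", "a2", "a3"}, {"b1", "b2", "b3"}, {"b4"}]
hits = [("a3", "b1"), ("b4", "b1")]

single = k_link_merge(seeds, hits, k=1)                    # everything merges
strict = k_link_merge(seeds, hits, k=2)                    # nothing merges
scaled = k_link_merge(seeds, hits, k=2, size_aware=True)   # only b4 joins b
```

In this toy case, k = 1 lets the chimera fuse the two unrelated clusters, k = 2 blocks that merge but also leaves the singleton stranded, and the size-aware variant blocks the chimeric merge while still admitting the singleton. This mirrors the trade-off between the two error types described in the abstract, under the assumptions stated above.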
Similar resources
Numerical modeling of links behavior in eccentric bracings with dual vertical links
The configuration and geometry of bracing systems significantly affect the seismic performance of structures. Recently, the current authors introduced a new configuration for eccentric bracing of structural frames that may be regarded as a combination of inverted Y-type and rotated K-type EBFs. The resulting braced frame is called EBF-DVL, consisting of two vertical links attached together by ...
Evaluating the Significance of Global and Local Features in Expressed Sequence Tag: A Clustering Quality Perspective
Clustering of expressed sequence tags (ESTs) plays an important role in gene analysis. Alignment-based sequence comparison is commonly used to measure the similarity between sequences, and recently some alignment-free comparisons have been introduced. In this paper, we evaluate the role of global and local features extracted from the alignment-free approaches, i.e., compression-based method...
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
ESTpass: a web-based server for processing and annotating expressed sequence tag (EST) sequences
We present a web-based server, called ESTpass, for processing and annotating sequence data from expressed sequence tag (EST) projects. ESTpass accepts a FASTA-formatted EST file and its quality file as inputs, and it then executes a back-end EST analysis pipeline consisting of three consecutive steps. The first is cleansing the input EST sequences. The second is clustering and assembling the cl...
EHPnet: Doctors and the Environment
Journal title:
Volume 25, Issue
Pages -
Publication date: 2009